Graphics processing unit for simulation or medical diagnostic imaging

ABSTRACT

Methods and systems provide simulation or medical diagnostic imaging with a graphics processing unit. Data to be processed by a graphics processing unit is transferred from a source to the graphics processing unit without copying by the central processing unit. For example, the central processing unit does not copy data to the cache. Instead, the source of data transfers the data directly to the graphics processing unit or directly to a graphics aperture region of a memory for transfer to the video memory of the GPU. The GPU is then used to generate a two-dimensional or three-dimensional image. The GPU is used to perform a medical imaging process, such as an ultrasound imaging process. The processed data is transferred to a different processor. Since the GPU provides various parallel processors, the GPU may more efficiently perform image processes different from rendering a two-dimensional or three-dimensional image.

BACKGROUND

The present invention relates to a graphics processing unit. Loading and processing data with the graphics processing unit (GPU) is controlled.

GPUs are provided as video cards on personal computers. Using the AGP specification, a central processing unit (CPU) coordinates the transfer of data from a random access memory to the GPU for video rendering. A memory control hub is connected by various buses to each of a source, a RAM, the CPU and the GPU. For example, an AGP chip set is used as a memory control hub. The memory control hub controls the data transfers between any of the various interconnected devices. The data is obtained from the source, such as a CD, diskette, or hard drive. The data from the source is routed to a random access memory (RAM). The CPU then copies the data from the random access memory into the CPU cache memory. For use of the GPU, the CPU copies the data to a graphics aperture region of the RAM controlled pursuant to a graphics address remapping table (GART). Prior to copying the data to the graphics aperture region, the CPU may also reformat the data, because the GPU expects the data to be in a particular format in order to deliver maximum throughput. The data from the graphics aperture region is then transferred through an accelerated graphics port (AGP) to the video memory of the GPU. The GPU then performs various rendering or video processing and outputs a resulting image to a display. Pursuant to an application programming interface (API), the CPU controls operation of the GPU.

Since the CPU copies the data from the RAM to the graphics aperture region of the RAM, the data is copied multiple times. Any loading on the CPU for other processing may delay the transfer of data to the GPU. Since the CPU operations may be interrupted, the transfer of data to the GPU is inconsistent or non-deterministic.

The AGP hardware discussed above may be used in a medical diagnostic ultrasound system, such as disclosed in U.S. Pat. No. 6,358,204, the disclosure of which is incorporated herein by reference. The memory control hub connects a CPU to a memory. The two other ports of the memory control hub are connected to two different buses, one a system bus and another an ultrasound data bus. The ultrasound data bus connects to a source of ultrasound data as well as to outputs of ultrasound data, such as a scan converter connected with the display.

BRIEF SUMMARY

By way of introduction, the preferred embodiments described below include methods and systems for simulation or medical diagnostic imaging with a graphics processing unit. In one embodiment, data to be processed by a graphics processing unit is transferred from a source to the graphics processing unit without copying by the central processing unit. For example, the central processing unit does not copy data to the cache. Instead, the source of data transfers the data directly to the graphics processing unit or directly to a graphics aperture region of a memory for transfer to the video memory of the GPU. The GPU is then used to generate a two-dimensional or three-dimensional image.

In another embodiment, the GPU is used to perform a medical imaging process, such as an ultrasound imaging process. The processed data is transferred to a different processor. Since the GPU provides various parallel processors, the GPU may more efficiently perform data processing different from rendering a two-dimensional or three-dimensional image.

In a first aspect, a graphics processing unit system is provided for diagnostic medical ultrasound imaging. A graphics processing unit has an input and at least one output. The graphics processing unit is operable to process ultrasound data from the input. A processor connects with the output of the graphics processing unit. The processor is operable to process ultrasound data output on the output of the graphics processing unit.

In a second aspect, a method for diagnostic medical ultrasound imaging with a graphics processing unit is provided. Ultrasound data is processed with the graphics processing unit. Ultrasound data output from the graphics processing unit is then further processed with a different processor prior to generating a display responsive to the ultrasound data. A display responsive to the ultrasound data is then generated.

In a third aspect, an improvement in a method for loading a video memory of a graphics processing unit is provided. A central processing unit interacts with a memory, such as a RAM memory, and the graphics processing unit. In the improvement, data is loaded into the video memory without storing the data in a cache of the central processing unit.

In a fourth aspect, a system for loading a video memory of a graphics processing unit is provided. A central processing unit connects with the graphics processing unit. The central processing unit is operable to run an application programming interface of the graphics processing unit. A source of data connects with the graphics processing unit. The data is transferable from the source to the video memory without copying of the data by the central processing unit.

The present invention is defined by the claims, and nothing in this section should be taken as a limitation on those claims. Further aspects and advantages of the invention are discussed below in conjunction with the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

The components and the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.

FIG. 1 is a block diagram of one embodiment of a system for loading a video memory of a graphics processing unit;

FIG. 2 is a flow chart diagram of one embodiment of a method for loading a video memory of a graphics processing unit;

FIG. 3 is a block diagram of one embodiment of a graphics processing unit and interconnected processor; and

FIG. 4 is a flow chart diagram of one embodiment of a method for processing diagnostic medical ultrasound data with a graphics processing unit.

DETAILED DESCRIPTION OF THE DRAWINGS AND PRESENTLY PREFERRED EMBODIMENTS

In one aspect, the routing of data for loading into a video memory of a GPU is controlled. In another aspect, the GPU performs image processing different from two- or three-dimensional rendering of an image. In another aspect, the GPU performs general mathematical computations. In yet another aspect, the GPU performs two- or three-dimensional renderings of an image. In another aspect, a combination of the aspects discussed above is provided. The different aspects may be used independently or in combination in other embodiments. Embodiments directed to loading data into the video memory are described first, followed by embodiments directed to performing different processes with the GPU.

FIG. 1 shows one embodiment of a system 10 for loading a video memory 12 of a GPU 14. A memory control hub 16 interconnects the GPU 14 with the CPU 18, a memory 20 and a source of data 22. Additional, different or fewer components may be provided. For example, the GPU 14 connects to the source 22 without one or more of the memory control hub 16, the CPU 18 and the memory 20. As another example, an additional component connects to the memory control hub 16. The system 10 is configured pursuant to the AGP specification, but may be configured pursuant to different specifications, such as PCI, PCI-X or PCI Express, or in arrangements with or without any of the various components. In one embodiment, the system 10 is a personal computer for generating graphical images, such as simulations. The system 10 may also be used as a workstation for generating graphical images from data representing an object, such as a scanned picture. In yet another embodiment, the system 10 is a medical imaging system, such as an X-ray, MRI, computed tomography, diagnostic ultrasound or other now known or later developed medical imaging system.

The GPU 14 is a processor, circuit, application specific integrated circuit, digital signal processor, video card, combinations thereof or other now known or later developed device for graphics processing. In one embodiment, the GPU 14 is a graphics processor or video card provided by nVIDIA, ATI or Matrox. These or other devices using an API of OpenGL, DirectX or other now known or later developed APIs may be used. In one embodiment, the GPU 14 includes one or more vertex processors, such as 16 vertex processors, and one or more fragment processors, such as 64 fragment processing units. Other analog or digital devices may also be included, such as rasterization and interpolation circuits. One or more frame buffers may be provided for outputting data to a display. The GPU 14 receives data in one or more formats and generates two- or three-dimensional images based on the data, such as by performing texture mapping or other two- or three-dimensional rendering. For example, the data received represents various objects with associated spatial relationships. The GPU 14 is operable to determine the relative positioning of the data and generate fragments representing data visible from a particular viewing direction. In one embodiment, the GPU 14 is operable to decompress data, so that compressed data may be transferred to the GPU 14 and the available transfer bandwidth is used more efficiently. Alternatively, uncompressed data is transferred to the GPU 14.

The GPU 14 includes the video memory 12. In one embodiment, the video memory 12 is a random access memory, but other now known or later developed memories may be used. The video memory 12 stores any of various amounts of information, such as 64, 128, 256 or another number of megabytes. The GPU 14 accesses information from the video memory 12 for graphics processing. Graphics processing is performed pursuant to the API run by the CPU 18.

The CPU 18 is a general processor, application specific integrated circuit, dedicated processor, digital signal processor, digital circuit, analog circuit, combinations thereof or other now known or later developed processing device. In one embodiment, the CPU 18 is a processor operable to control a system pursuant to the AGP specification. In alternative embodiments, processors operating pursuant to the same or different specifications may be provided. In one embodiment, the CPU 18 is configured in a parallel processing arrangement, such as including two or more processors for controlling or processing data. Any now known or later developed processors may be used. The CPU 18 connects with the GPU 14 for running an application programming interface of the GPU 14. The CPU 18 provides instructions pursuant to the API for controlling the graphics rendering. The CPU 18 implements a driver for the GPU 14 operable to accept pre-formatted data without processing by the CPU 18. The CPU 18 also controls the memory control hub 16 and the associated memory 20. In one embodiment, the CPU 18 controls or processes data from the source 22. Alternatively, the source 22 operates independently of the CPU 18.

The memory 20 is a random access memory, such as arranged in one, two or more different chips or chip sets. Other now known or later developed memories may be used. The memory 20 is connected with the CPU 18, such as through the memory control hub 16, for allowing the CPU 18 access to the memory 20. The memory 20 is controlled by the CPU 18. In one embodiment, the memory 20 has a common address scheme accessible by the memory control hub 16 or the CPU 18. A section or group of addresses of the memory 20 is assigned as a graphics aperture region. The addresses associated with the graphics aperture region identify addresses for data to be transferred to the video memory 12. The graphics aperture region is generally not accessible for uses other than transfer of data to the GPU 14. In one embodiment, the size of the graphics aperture region matches the size of the video memory 12. In alternative embodiments, the graphics aperture region is accessible for other uses or may be a different size than the video memory 12. The GPU 14, the memory control hub 16 or the CPU 18 causes data stored in the graphics aperture region to be transferred or copied to the video memory 12. In one embodiment, the graphics aperture region and common address scheme are configured as part of a contiguous memory controlled by a Graphics Address Re-mapping Table (GART) for controlling access to the memory 20.
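
By way of illustration only, the address translation performed through a GART can be modeled as a page table that presents scattered pages of the memory 20 as one contiguous aperture. The following Python sketch assumes a 4 KiB page size; the class `Gart` and its `translate` method are hypothetical names and do not represent the AGP specification or any driver interface.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class Gart:
    """Toy model of a graphics address remapping table (hypothetical, not a driver API)."""

    def __init__(self, aperture_base, physical_pages):
        # physical_pages[i] is the physical page number in memory 20 that backs
        # page i of the contiguous graphics aperture.
        self.aperture_base = aperture_base
        self.physical_pages = list(physical_pages)

    def translate(self, aperture_addr):
        """Map an address inside the aperture to a physical address in memory 20."""
        offset = aperture_addr - self.aperture_base
        if not 0 <= offset < len(self.physical_pages) * PAGE_SIZE:
            raise ValueError("address outside the graphics aperture")
        page, within_page = divmod(offset, PAGE_SIZE)
        return self.physical_pages[page] * PAGE_SIZE + within_page


# Three contiguous aperture pages backed by scattered physical pages 7, 3 and 42.
gart = Gart(aperture_base=0xE000_0000, physical_pages=[7, 3, 42])
print(hex(gart.translate(0xE000_0010)))  # 0x7010
print(hex(gart.translate(0xE000_1004)))  # 0x3004
```

Because the remapping is done per page, the source 22 or the GPU 14 can treat the aperture as a single linear range even when the memory 20 backing it is fragmented.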

In one embodiment, the graphics aperture region is operable to slide or change memory addresses in an address loop. The addresses of the graphics aperture region slide within the memory 20 such that the start and end memory locations of the graphics aperture region can be incremented or decremented within the region. When the upper or lower end of the graphics aperture region is reached, the address is shifted to the opposing lower or upper end, respectively, in a circular fashion. As a result, a memory loop of graphics data is provided within the graphics aperture region. In alternative embodiments, the memory 20 is divided into separately accessible sections or includes multiple devices that are separately controlled and accessed.
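
The sliding address loop can be sketched as modular arithmetic on the offset of the aperture window within its region. The class `SlidingAperture` and its parameters are hypothetical and only illustrate the circular wraparound described above.

```python
class SlidingAperture:
    """Hypothetical model of an aperture window that slides circularly within a region."""

    def __init__(self, region_start, region_size, window_size):
        if window_size > region_size:
            raise ValueError("window larger than region")
        self.region_start = region_start
        self.region_size = region_size
        self.window_size = window_size
        self.offset = 0  # offset of the window start within the region

    def slide(self, step):
        """Increment (step > 0) or decrement (step < 0) the window, wrapping at the ends."""
        self.offset = (self.offset + step) % self.region_size

    def address(self, i):
        """Memory address of the i-th location in the current window."""
        if not 0 <= i < self.window_size:
            raise IndexError("index outside window")
        return self.region_start + (self.offset + i) % self.region_size


ap = SlidingAperture(region_start=1000, region_size=10, window_size=4)
ap.slide(8)
print([ap.address(i) for i in range(4)])  # [1008, 1009, 1000, 1001]
ap.slide(-9)
print([ap.address(i) for i in range(4)])  # [1009, 1000, 1001, 1002]
```

Python's `%` operator returns a non-negative result for a positive modulus. As a result, both incrementing and decrementing the window wrap to the opposite end of the region.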

The source 22 is a database, memory, sensor, CD, disk drive, hard drive, tape, tape reader, modem, computer network, or other now known or later developed source of graphics data. In one embodiment, the source 22 is a program or software operated by the CPU 18 to generate a graphics simulation. In another embodiment, the source of data 22 is a medical sensor, such as an X-ray, MRI, computed tomography or medical diagnostic ultrasound scanner. Medical diagnostic imaging data is provided by the source 22. For example, the source 22 is an ultrasound beamformer operable to receive acoustic echoes representing a patient. Ultrasound data includes in-phase and quadrature data from a beamformer and spectral Doppler data. The ultrasound beamformer generates one or more data samples representing different spatial locations along a plurality of beams of a scan of a patient. Frames of data representing each two- or three-dimensional region are then output. The ultrasound data is detected using any of various detectors, such as B-mode, Doppler, harmonic, contrast agent or other now known or later developed detectors. In one embodiment, the ultrasound beamformer provides data prior to beamforming, ultrasound data prior to detection, or ultrasound data after detection. The source of data 22 is connected with the GPU 14 either directly or through one or more devices as shown in FIG. 1.
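
One of the detectors named above is B-mode detection. It computes the echo envelope from in-phase and quadrature samples and log-compresses that envelope for display. The sketch below represents each I/Q pair as a Python complex number. It assumes a 60 dB dynamic range and 8-bit display levels. The function name and both parameters are illustrative and are not taken from the source.

```python
import math


def bmode_detect(iq_samples, dynamic_range_db=60.0, max_level=255):
    """Envelope-detect complex I/Q samples and log-compress them to display levels."""
    envelopes = [abs(s) for s in iq_samples]  # |I + jQ| = sqrt(I^2 + Q^2)
    peak = max(envelopes)
    if peak == 0:
        return [0] * len(envelopes)
    levels = []
    for e in envelopes:
        # Level in dB relative to the brightest sample (0 dB at the peak).
        db = 20.0 * math.log10(e / peak) if e > 0 else -math.inf
        # Map [-dynamic_range_db, 0] dB linearly onto [0, max_level]; clip below.
        fraction = max(0.0, 1.0 + db / dynamic_range_db)
        levels.append(round(fraction * max_level))
    return levels


print(bmode_detect([complex(3, 4), complex(0.5, 0), 0j]))  # [255, 170, 0]
```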

In one embodiment, the source 22 includes a processor, a buffer, or a formatter for configuring the data. For example, a buffer and processor are used with an ultrasound beamformer for GPU-specific formatting of texture data, acquired in a polar coordinate format, for three-dimensional texture rendering by the GPU 14. In one embodiment, the GPU 14 uses a format for three-dimensional texturing that optimizes memory access speeds. The data is arranged in an order that provides a GPU-specific format for data transfer. In alternative embodiments, the GPU 14 includes a buffer, processor or formatter for GPU-specific formatting of the data. For 3D texture or other data, different formats may be used for the data provided from the source 22 or for the data used by the GPU 14.
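
As one hypothetical example of GPU-specific formatting, a source could reorder a linearly stored volume into cubic bricks. Spatially neighboring voxels then sit close together in memory. Actual GPU texture layouts are vendor-specific and are not specified here. The function `to_bricks` and the brick layout are assumptions for illustration only.

```python
def to_bricks(volume, dims, brick):
    """Reorder a linear volume (x fastest, then y, then z) into brick-major order.

    volume: flat list of voxels with len == nx * ny * nz
    dims:   (nx, ny, nz), each a multiple of `brick`
    brick:  edge length of a cubic brick
    """
    nx, ny, nz = dims
    if any(d % brick for d in dims):
        raise ValueError("dimensions must be multiples of the brick size")
    out = []
    # Visit bricks in x-fastest order, then the voxels inside each brick in x-fastest order.
    for bz in range(0, nz, brick):
        for by in range(0, ny, brick):
            for bx in range(0, nx, brick):
                for z in range(bz, bz + brick):
                    for y in range(by, by + brick):
                        for x in range(bx, bx + brick):
                            out.append(volume[(z * ny + y) * nx + x])
    return out


print(to_bricks(list(range(16)), (4, 2, 2), 2))
# [0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15]
```

The reordering only permutes voxels. It can therefore be done once at the source 22, and the CPU 18 does not need to touch the data.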

In one embodiment, the source 22 is operable to provide data representing a three-dimensional volume. For example, an ultrasound medical sensor and associated beamformer are operable to scan a patient in a three-dimensional or volume region. The scan is performed using a mechanically moved or multi-dimensional transducer array to scan a volume by firing a plurality of ultrasound lines. Ultrasound data is then provided in sets of data representing a three-dimensional volume. More than one set may represent the same volume at the same time, such as a Doppler set and a B-mode set of data. For four-dimensional imaging, a plurality of sets representing the volume at different times is provided. As another example, two sets of data, processed differently, are used. Processing includes filtering, and spatial, frequency or other filtering may be provided for processing the data. One processed set of data is used for three-dimensional volume rendering. The other processed set of data is used for generating two-dimensional representations or slices of the volume. In one embodiment using a graphics aperture region for four-dimensional volume rendering, one processed set of data is overwritten as each set is acquired to conserve memory space. The other processed set of data is maintained throughout a time period for later processing, three-dimensional rendering or two-dimensional imaging. Alternatively, both processed data sets are stored representing volumes at multiple times. In one embodiment, sets of data are maintained until the extent of the graphics aperture region has been used. The addresses are then looped back to the data representing the earliest set, which is overwritten by the more recently acquired data in a CINE loop fashion.
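
The two-set storage scheme just described can be modeled in a few lines. One processed set is simply overwritten on each acquisition. The other set is kept in a fixed number of slots that loop back and overwrite the earliest entry. In this sketch the class `CineStore`, the slot count and the filter callables are hypothetical stand-ins for the aperture extent and the spatial or frequency filters.

```python
class CineStore:
    """Hypothetical model of the two differently processed data sets described above."""

    def __init__(self, capacity):
        self.render_set = None            # latest 3D-rendering set, overwritten each time
        self.history = [None] * capacity  # fixed slots standing in for the aperture extent
        self.next_slot = 0                # slot the next acquisition will write to
        self.count = 0                    # number of slots currently holding data

    def acquire(self, volume, filter_3d, filter_2d):
        """Store one newly acquired volume, processed two different ways."""
        self.render_set = filter_3d(volume)
        self.history[self.next_slot] = filter_2d(volume)
        # Loop back to the first slot once the end is reached (CINE loop).
        self.next_slot = (self.next_slot + 1) % len(self.history)
        self.count = min(self.count + 1, len(self.history))

    def playback(self):
        """Return the retained 2D-processed sets from earliest to latest."""
        n = len(self.history)
        start = (self.next_slot - self.count) % n
        return [self.history[(start + i) % n] for i in range(self.count)]


store = CineStore(capacity=3)
for t in range(5):
    store.acquire(t, filter_3d=lambda v: ("3d", v), filter_2d=lambda v: ("2d", v))
print(store.render_set)  # ('3d', 4)
print(store.playback())  # [('2d', 2), ('2d', 3), ('2d', 4)]
```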

The memory control hub 16 is a processor, a bus, an application specific integrated circuit, an AGP chip set, an AGP controller, combinations thereof or other now known or later developed device for interfacing between two or more of the GPU 14, CPU 18, memory 20 and the source 22. In one embodiment, a single device is provided for the interface, such as a single circuit or chip, but multiple devices may be provided in any of various possible architectures for transferring data between any two or more of the devices connected with the memory control hub 16. In yet other alternative embodiments, the various devices connect directly through a single data bus or with each other without the memory control hub 16. In one embodiment, the memory control hub 16 connects with the GPU 14 through an accelerated graphics port (AGP) bus, with the CPU 18 through a host or front side bus, with the memory 20 through a memory bus, and with the source 22 through a PCI-X or PCI acquisition bus. Different buses or signal lines may be used for any of the various connections discussed above, including now known or later developed connections.

Rather than routing the data from the source 22 to the memory 20, then through the CPU 18 to the graphics aperture region of the memory 20, and finally from the graphics aperture region to the video memory 12, the data is routed from the source 22 to the graphics aperture region or to the video memory 12 without copying or loading of the data by the CPU 18. The data from the source 22 is routed using a driver, software or other control implemented by the memory control hub 16, the CPU 18, the GPU 14 or another device. The data from the source 22 is operable to be routed to the video memory 12 through the memory control hub 16 without passing to the CPU 18. For example, the data for processing by the GPU 14 is not stored in the cache memory of the CPU 18.

In one embodiment, the data from the source 22 is operable to be routed from the source 22 to the video memory 12 through the graphics aperture region of the memory 20 without passing to the CPU 18. The data is written directly into the graphics aperture region of the memory 20 for transfer or copying to the video memory 12 by the memory control hub 16. In another embodiment, the data is operable to be routed from the source 22 to the video memory 12 without passing to the CPU 18 or the associated memory 20. The memory 20, including the graphics aperture region, is bypassed by routing the data directly from the source 22 to the video memory 12.
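
The conventional route and the two alternative routes can be compared with a toy model. The model lists where the data is written on its way to the video memory 12. The location labels are illustrative names for components of FIG. 1. The model only counts writes and does not represent bus timing or bandwidth.

```python
# Storage locations the data is written to on its way to the video memory 12.
# The labels are illustrative names for components of FIG. 1.
ROUTES = {
    "conventional": ["memory_20", "cpu_cache", "graphics_aperture", "video_memory_12"],
    "via_aperture": ["graphics_aperture", "video_memory_12"],
    "direct": ["video_memory_12"],
}


def describe(route_name):
    """Return (number of writes, whether the CPU cache handles the data) for a route."""
    stops = ROUTES[route_name]
    return len(stops), "cpu_cache" in stops


for name in ROUTES:
    writes, via_cpu = describe(name)
    print(f"{name}: {writes} write(s), passes through CPU cache: {via_cpu}")
```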

FIG. 2 shows a method for loading a video memory of a graphics processing unit using the system 10 of FIG. 1 or another system. The CPU interacts with a memory separate from the GPU. An improvement is provided by loading data into the video memory without storing the data in a cache of the CPU in act 24. As represented by the disconnect between acts 24 and 26, the CPU does not load the data from the source for the GPU into the cache of the CPU. The data is provided to the GPU 14 without copying by the CPU, such as without copying from one location of the memory 20 to a graphics aperture region of the memory.

In response to user selections or otherwise configuring the system, the CPU begins an application program using the GPU. For example, a user selects three-dimensional or four-dimensional imaging. The CPU then instructs the GPU to become the bus master and download data from the graphics aperture region or the source. The video memory is loaded without processing of the data by the CPU during the transfer. The data is transferred without GPU-specific formatting (e.g., swizzling) or copying by the CPU. The CPU performs control aspects of the data transfer by signaling the GPU and/or other devices.

In act 28, the data is transferred to a graphics aperture region of the memory associated with the CPU, such as a RAM. For example, ultrasound data is written from a beamformer or other medical sensor into the graphics aperture region of the memory of FIG. 1. The source writes the data directly into the graphics aperture region. In one embodiment, the data is formatted for use by the GPU and output from the source. As a result, GPU-specific formatting of three-dimensional texture data by the CPU is avoided, and the source performs any GPU-specific formatting. Alternatively, the data is provided to the graphics aperture region without a particular format for the GPU. In this example, the GPU-specific formatting is performed by the GPU after transfer to the GPU.

The data written into the graphics aperture region is transferred to the video memory without processing or copying of the data by the CPU. For example, the GPU acquires control of the bus, or of a portion of the bus, connected with the memory having the graphics aperture region (i.e., the GPU bus-masters). The GPU then downloads the data from the graphics aperture region into the video memory. Alternatively, the CPU, the memory control hub, the source or another device controls one or more buses to cause the transfer of the data from the graphics aperture region to the video memory. While the CPU is operable to run an application programming interface for controlling the GPU, the CPU operates free of copying data between different locations of the memory for transfer to the video memory.

In one embodiment, the source writes the data to a graphics aperture region that slides by using an address loop as discussed above. All or a portion of the graphics aperture region uses the looping address structure, allowing one type of data or all of the data to be configured in a loop fashion for representing a volume or area at different times.

As an alternative to act 28, the source writes the data to the video memory in act 30, such as transferring the data to the video memory without storing the data in a graphics aperture region. Based on control signals from the CPU or another device, the GPU controls the transfer or otherwise acquires the data from the source. Alternatively, the source, the memory control hub, or the CPU controls one or more buses for causing the data to be written to the video memory.

In alternative embodiments, some data output by the source is directed to the video memory without transfer to the graphics aperture region, while other data is transferred to the video memory through the graphics aperture region. In yet other alternative embodiments, a subset of data may be copied by the CPU, stored in the cache of the CPU or otherwise processed by the CPU as part of the transfer to the video memory. CINE buffering, or otherwise storing different representations of the same volume at different times, is provided in the video memory or as a function of the timing of the output of data from the source.

In either of the embodiments of acts 28 or 30, the data is formatted for the GPU without processing by the CPU. Any of various formats may be provided. In one embodiment, the formatting includes compression of the data prior to the transfer to the video memory. After the transfer to the video memory, the data is decompressed with the GPU. Any now known or later developed lossy or lossless compression scheme may be used, such as texture compression.
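
Run-length encoding can illustrate the arrangement of compressing before transfer and decompressing on the GPU. It is a simple lossless scheme, substituted here for the texture compression formats a GPU would actually use. The functions `rle_encode` and `rle_decode` are hypothetical stand-ins for the source-side and GPU-side steps.

```python
def rle_encode(data):
    """Run-length encode a sequence of sample values into (value, run) pairs."""
    encoded = []
    for v in data:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1  # extend the current run
        else:
            encoded.append([v, 1])  # start a new run
    return [tuple(pair) for pair in encoded]


def rle_decode(pairs):
    """Expand (value, run) pairs back into the original sequence."""
    out = []
    for v, run in pairs:
        out.extend([v] * run)
    return out


samples = [0, 0, 0, 0, 12, 12, 200, 0, 0, 0]
packed = rle_encode(samples)          # source side, before the transfer
print(packed)                         # [(0, 4), (12, 2), (200, 1), (0, 3)]
assert rle_decode(packed) == samples  # GPU side, after the transfer (lossless)
```

The fewer pairs the encoder emits relative to the original sample count, the less data crosses the bus. For data with few repeated values, run-length encoding can instead expand the data, which is one reason dedicated texture compression formats are preferred in practice.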

By transferring data to the GPU without copying by the CPU, the transfer speed between the source and the GPU may be increased. Increased transfer speed may allow for increased volume rendering rates for three-dimensional or four-dimensional imaging. Interrupts or other processing performed by the CPU may not delay the transfer of data to the GPU. Windows or other operating system latencies may have no or minimal effect on the volume rendering by the GPU. Increased volume rendering rates due to increased data transfer rates may allow for four-dimensional cardiology volume rendering. Overlapping pipeline sequences for transferring data, or other operations that increase parallel transfers of data, may also increase transfer rates.

FIG. 3 shows a graphics processing unit system 32 for diagnostic medical ultrasound imaging. The system 32 includes a GPU 34 in the configuration discussed above for the GPU 14 of FIG. 1, in a configuration disclosed in any one of U.S. Pat. Nos. ______ and ______ (application Ser. Nos. 10/644,363 and 10/388,128), the disclosures of which are incorporated herein by reference, or in other GPU configurations provided anywhere in medical diagnostic ultrasound systems (e.g., a system with an ultrasound transducer or a workstation for processing ultrasound data).

The GPU 34 includes a programmable vertex processor 36, a primitive assembly processor 38, a rasterization and interpolation processor 40, a programmable fragment processor 42 and a frame buffer 44. Additional, different or fewer components may be provided. Each of the processors of the GPU 34 is a general processor, digital circuit, analog circuit, application specific integrated circuit, digital processor, graphics accelerator card, display card or other now known or later developed device. In one embodiment, the GPU 34 is implemented as a series of discrete devices on a motherboard or as a daughter board, but may be implemented as a single chip, a circuit on a card or other layout. The programmable vertex processor 36 is a group of 16 parallel processing units in one embodiment, but fewer or more processors may be provided. The fragment processor 42 is a parallel arrangement of 64 processing units in one embodiment, but more or fewer processing units may be provided.

FIG. 3 shows the graphics processing pipeline standardized by APIs such as OpenGL and DirectX. The GPU 34 includes a programmable vertex processor 36, a primitive assembly 38, a rasterization and interpolation block 40, a programmable fragment processor 42 and a frame buffer 44. The input to the vertex processor 36 is a set of vertices in two- or three-dimensional space. Each vertex has a set of attributes such as coordinates, color and texture coordinates. The vertex processor 36 transforms the coordinates of the vertices into a frame of reference. The output of the vertex processor 36 is a set of vertices with new attributes changed by the vertex processor 36. These vertices are fed into the next stage, the primitive assembly 38. Here, the vertices are grouped together to form points, lines and triangles. These primitives are then fed into the rasterization and interpolation stage 40. This stage rasterizes each primitive, such as a point, line or triangle, into a set of fragments. A fragment is a pixel with an associated depth that is located on a primitive. The fragments have attributes such as color, coordinates and texture coordinates. The next stage, the programmable fragment processor 42, takes in these fragments, applies various processes to them, and creates pixels. The pixels have attributes, such as color, and are written into the final stage, the frame buffer 44. Other now known or later developed structures and processes may be used in the graphics pipeline for graphics rendering. The blocks shown in FIG. 3 are high-level blocks, and each block contains many finer processing stages. For example, the rasterization and interpolation stage 40 can contain operations such as the scissor test, alpha test, stencil test and depth test. The frame buffer 44 is a memory, buffer or other device for receiving the pixels from the fragment processor 42 for display on the display 46.
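
A simplified 2D pipeline running on the CPU can illustrate how data moves through these stages. The vertex stage applies an affine transform. Primitive assembly groups the vertices into triangles. Rasterization uses edge functions with barycentric interpolation, and a fragment stage and a frame buffer complete the chain. This is a teaching model with stated simplifications (no depth, clipping, texturing or per-fragment tests). It is not how OpenGL, DirectX or any GPU implements the pipeline, and all function names are hypothetical.

```python
def vertex_stage(vertices, matrix):
    """Transform each (x, y, color) vertex by a 2x3 affine matrix."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty, col) for x, y, col in vertices]


def primitive_assembly(vertices):
    """Group consecutive vertices into triangles."""
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices) - 2, 3)]


def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, width, height):
    """Yield fragments (x, y, interpolated color) for pixel centers inside tri."""
    (x0, y0, c0), (x1, y1, c1), (x2, y2, c2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return  # degenerate triangle covers no pixels
    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5
            # Barycentric weights; dividing by the signed area handles either winding.
            w0 = edge(x1, y1, x2, y2, cx, cy) / area
            w1 = edge(x2, y2, x0, y0, cx, cy) / area
            w2 = edge(x0, y0, x1, y1, cx, cy) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield px, py, w0 * c0 + w1 * c1 + w2 * c2


def fragment_stage(color):
    """Example fragment program: clamp and quantize to an 8-bit gray level."""
    return max(0, min(255, round(color)))


def render(vertices, matrix, width, height):
    frame_buffer = [[0] * width for _ in range(height)]
    for tri in primitive_assembly(vertex_stage(vertices, matrix)):
        for x, y, color in rasterize(tri, width, height):
            frame_buffer[y][x] = fragment_stage(color)
    return frame_buffer


identity = ((1, 0, 0), (0, 1, 0))
square = [(0, 0, 100), (4, 0, 100), (0, 4, 100), (4, 0, 100), (4, 4, 100), (0, 4, 100)]
for row in render(square, identity, 4, 4):
    print(row)  # each row: [100, 100, 100, 100]
```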

The GPU 34 is operable to receive graphics data and generate a display on the display 46 from the graphics data. The process is performed pursuant to an application programming interface, such as GDI, GDI+, DirectX, OpenGL, or other now known or later developed APIs. Additionally or alternatively, the GPU 34 is used to process ultrasound data for purposes other than immediate display. For example, in-phase and quadrature data, post-detection data, log-compressed data, scan-converted data or any other ultrasonic data is input to the GPU 34. The ultrasound data is processed using the programmable vertex processor 36 and/or the fragment processor 42. OpenGL, DirectX extensions or other programming languages, such as the Cg shader language, program the GPU 34 to process ultrasound data. HLSL, Stanford's high-level shader language or other now known or later developed shader languages may also be used. Some resource-intensive computations are performed by the GPU 34 rather than by another processor, such as a CPU, DSP, ASIC or FPGA. Since the GPU 34 functions as a computational engine, one or more additional outputs are provided. For example, an output is provided downstream of the programmable vertex processor 36 but upstream of the fragment processor 42. As an alternative or additional output, an output is provided after the fragment processor 42. Alternatively, the output from the frame buffer 44 is used.

Either or both of the vertex processor 36 and fragment processor 42 are programmed to perform ultrasound data processing. For example, the vertex processor 36 is programmed or operable to perform scan conversion operations. Using the vector or matrix type polar coordinate data, the vertex processor 36 reformats each spatial location into a format appropriate for a display. As another example, the fragment processor 42 is operable to perform Fourier transforms or non-linear scan conversion operations. Scan-converted ultrasound data output by the vertex processor 36 is provided to the programmable fragment processor 42 for non-linear operations through interpolation or other fragment processes.
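
Below is a minimal sketch of polar-to-Cartesian scan conversion for a sector scan. It assumes beams evenly spaced in angle and samples evenly spaced in range. The sketch uses inverse mapping with bilinear interpolation, one common approach chosen here for illustration, because the source does not specify the interpolation method. On the GPU 34, the geometry step would fall to the vertex processor 36 and the interpolation to the rasterizer and fragment processor 42, not to a Python loop. The function name and parameters are hypothetical.

```python
import math


def scan_convert(frame, r_max, theta_span, nx, nz):
    """Resample a polar (range x beam) frame onto an nx-by-nz Cartesian grid.

    frame[i][j] is the sample at range index i on beam j. Ranges are evenly
    spaced over [0, r_max]; beams are evenly spaced over
    [-theta_span / 2, +theta_span / 2], with theta measured from the depth axis.
    Pixels outside the scanned sector are left at 0.0.
    """
    n_r, n_b = len(frame), len(frame[0])
    half_width = r_max * math.sin(theta_span / 2)
    image = [[0.0] * nx for _ in range(nz)]
    for iz in range(nz):
        z = r_max * iz / (nz - 1)  # depth of this row
        for ix in range(nx):
            x = -half_width + 2 * half_width * ix / (nx - 1)  # lateral position
            # Geometry step: Cartesian pixel -> polar (range, angle).
            r, theta = math.hypot(x, z), math.atan2(x, z)
            fr = r / r_max * (n_r - 1)  # fractional range index
            fb = (theta + theta_span / 2) / theta_span * (n_b - 1)  # fractional beam index
            if not (0 <= fr <= n_r - 1 and 0 <= fb <= n_b - 1):
                continue
            # Interpolation step: bilinear blend of the four surrounding samples.
            i0, j0 = min(int(fr), n_r - 2), min(int(fb), n_b - 2)
            dr, db = fr - i0, fb - j0
            image[iz][ix] = ((1 - dr) * (1 - db) * frame[i0][j0]
                             + (1 - dr) * db * frame[i0][j0 + 1]
                             + dr * (1 - db) * frame[i0 + 1][j0]
                             + dr * db * frame[i0 + 1][j0 + 1])
    return image


# A 3-range x 3-beam frame whose value equals the range index.
frame = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
image = scan_convert(frame, r_max=1.0, theta_span=math.pi / 2, nx=5, nz=5)
```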

In one embodiment, the GPU 34 is operable to process ultrasound data and provide the processed data to a different image processor 48. The image processor 48 provides data for the display 46 or routes data back to the GPU 34 for rendering to the display 46.

The processor 48 is a general processor, application specific integrated circuit, digital signal processor, image processor, FPGA, CPU, analog circuit, digital circuit, combinations thereof or other now known or later developed device for processing ultrasound data. The processor 48 is operable to process ultrasound data output by the GPU 34. For example, the processor 48 and GPU 34 are provided as part of an ultrasound data path beginning at a beamformer and ending at the display 46. The GPU 34 implements at least part of one ultrasound process, such as receive beamformation, scan conversion, motion detection, another ultrasound process or combinations thereof. The processor 48 implements at least part of the same or a different ultrasound process, such as detection, motion tracking, beamforming, filtering, scan conversion, another ultrasound process, or combinations thereof. The vertex processor 36 and the fragment processor 42 have independent instruction sets assembled by the shader language or other programming interface. Ultrasonic data sent to the GPU 34 is processed by the vertex processor 36 and/or the fragment processor 42 for implementing ultrasound imaging processes.

Since the GPU 34 may be less likely to be interrupted than a central processing unit or other processors, the GPU 34 may provide more consistent or reliable image processing. Even if the clock rate of the GPU 34 is lower than that of other processors, the parallel processing provided by the GPU 34 may still yield faster image processing. The GPU 34 is capable of carrying out a large number of floating point computations in parallel.

FIG. 4 shows a method for diagnostic medical ultrasound imaging with a graphics processing unit. In act 56, a GPU processes ultrasound data. For example, at least a part of an ultrasound process, such as receive beamformation, scan conversion, motion detection or another ultrasound process, is performed by the GPU. In one embodiment, the programmable vertex processor 36 and/or programmable fragment processor 42 are used to perform the ultrasound process. GPUs have architectures optimized for vector and matrix data types. Vector and matrix processes are used for ultrasound data processing, such as receive beamformation, scan conversion, motion tracking or correlation processes.

For example, receive beamformation is performed by combining data from a plurality of different channels, such as 128 channels, over time. With the vertex processor, the GPU alters the relative position of the data across the channels along a temporal dimension; with the fragment processor, it weights and combines the data associated with a particular location across the 128 or other number of channels. As another example of receive beamformation, a fast Fourier transform and an inverse fast Fourier transform are used. The vertex processor passes the ultrasound data to the fragment processor. The fragment processor identifies a fragment value, finds a neighbor, and combines the fragments with a weight. Using feedback to the input of the programmable fragment processor 42 or the input of the GPU, an iterative process is implemented to provide the fast Fourier transform. The data is then exported for further receive beamformation processing by the processor 48. Alternatively, the GPU 34 performs the further processing. The GPU 34 is then used to apply an inverse fast Fourier transform. The inverse-transformed data represents beamformed data, or a plurality of samples representing different locations within a patient. The Fourier and inverse Fourier transforms implemented by the GPU 34 are described in U.S. Pat. No. ______ (Application Ser. No. ______) (Attorney reference no. 2001P20912US), the disclosure of which is incorporated herein by reference.
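The shift, weight, and combine steps of receive beamformation described above can be sketched as a simple delay-and-sum beamformer. This is a minimal CPU-side Python illustration under stated assumptions: delays are whole samples (practical systems typically interpolate fractional delays), the per-channel weights are an arbitrary apodization supplied by the caller, and the name `delay_and_sum` is hypothetical. It does not show how the vertex and fragment processors divide the work.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int],
                  weights: Sequence[float]) -> List[float]:
    """Hypothetical delay-and-sum receive beamformer.

    channels[c][t] is the sample of channel c at time index t.
    delays[c] is an integer sample shift applied to channel c, i.e. the
    change of relative position along the temporal dimension.
    weights[c] is the apodization weight applied to channel c.
    Returns one beamformed sample per output time index.
    """
    if not (len(channels) == len(delays) == len(weights)):
        raise ValueError("channels, delays and weights must have equal length")
    n_samples = len(channels[0])
    out = [0.0] * n_samples
    for samples, d, w in zip(channels, delays, weights):
        for t in range(n_samples):
            src = t + d  # read this channel shifted by its delay
            if 0 <= src < len(samples):
                out[t] += w * samples[src]  # weight, then combine
    return out


# Echo arrives at t=2 on channel 0 and t=3 on channel 1; a delay of 1 aligns them.
print(delay_and_sum([[0, 0, 1, 0, 0], [0, 0, 0, 1, 0]], [0, 1], [0.5, 0.5]))
```

In this sketch each output sample depends on only one sample per channel, so all output samples could be computed independently. That is the kind of parallelism the text attributes to the GPU.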

As another example, the vertex processor reformats data from a polar coordinate system into a Cartesian coordinate system. For example, scan conversion is implemented by assigning coordinates associated with the current data as a function of the new format. Linear interpolation by the rasterization and interpolation processor completes the scan conversion. For non-linear scan conversion processes, the fragment processor 42 implements the non-linear function.
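The following is a minimal sketch of sector scan conversion from polar (beam angle, range) samples to a Cartesian grid, with bilinear interpolation between the four nearest polar samples. It uses the inverse mapping: each output pixel computes its own polar position. That is closer to a per-fragment approach than to the vertex-and-rasterizer approach described above, where the rasterizer interpolates linearly across screen-space primitives. The sector geometry (angles measured from the depth axis z, uniform angle and range spacing) and the function name `scan_convert` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def scan_convert(polar: Sequence[Sequence[float]],
                 theta0: float, dtheta: float,
                 r0: float, dr: float,
                 xs: Sequence[float], zs: Sequence[float]) -> List[List[float]]:
    """Hypothetical polar-to-Cartesian scan converter.

    polar[i][j] is the sample on beam i (angle theta0 + i*dtheta radians,
    measured from the z axis toward +x) at range r0 + j*dr.
    Returns image[k][m], the value at Cartesian point (xs[m], zs[k]).
    Points outside the scanned sector are set to 0.0.
    """
    n_beams, n_ranges = len(polar), len(polar[0])
    if n_beams < 2 or n_ranges < 2:
        raise ValueError("need at least 2 beams and 2 range samples")
    image: List[List[float]] = []
    for z in zs:
        row: List[float] = []
        for x in xs:
            fi = (math.atan2(x, z) - theta0) / dtheta  # fractional beam index
            fj = (math.hypot(x, z) - r0) / dr          # fractional range index
            if not (0.0 <= fi <= n_beams - 1 and 0.0 <= fj <= n_ranges - 1):
                row.append(0.0)  # outside the scanned sector
                continue
            i0 = min(int(fi), n_beams - 2)
            j0 = min(int(fj), n_ranges - 2)
            a, b = fi - i0, fj - j0  # interpolation weights in [0, 1]
            row.append((1 - a) * (1 - b) * polar[i0][j0]
                       + (1 - a) * b * polar[i0][j0 + 1]
                       + a * (1 - b) * polar[i0 + 1][j0]
                       + a * b * polar[i0 + 1][j0 + 1])
        image.append(row)
    return image
```

A non-linear scan conversion, which the text assigns to the fragment processor 42, would replace the bilinear weights `a` and `b` with some other function of the fractional indices.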

As another example, motion tracking or motion detection using correlation or other processes is performed by the GPU 34. Either the vertex processor or the fragment processor may be used for implementing correlation functions, such as cross correlation or minimum sum of absolute differences. Other ultrasound processes may be performed with the GPU 34. Depending on which component of the GPU 34 implements the process, any of the various outputs to the image processor 48 may be used.
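Motion tracking by minimum sum of absolute differences can be sketched as exhaustive block matching. A block from a reference frame is compared against every candidate position within a small search window of the current frame, and the displacement with the lowest SAD is kept. This minimal Python sketch runs on a CPU. The function name `track_block`, the square block, and the integer-pixel search are assumptions, and cross correlation (the other option named above) is not shown.

```python
from typing import Sequence, Tuple


def track_block(ref: Sequence[Sequence[float]],
                cur: Sequence[Sequence[float]],
                top: int, left: int, size: int,
                search: int) -> Tuple[int, int]:
    """Return the (dy, dx) shift that best matches a block between frames.

    The size x size block of `ref` whose upper-left corner is (top, left)
    is compared with same-sized blocks of `cur` shifted by up to `search`
    pixels in each direction. The shift with the minimum sum of absolute
    differences (SAD) wins; ties keep the first shift found in scan order.
    """
    h, w = len(cur), len(cur[0])
    best_sad = None
    best_shift = (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue  # candidate block falls outside the current frame
            total = sum(abs(ref[top + i][left + j] - cur[y0 + i][x0 + j])
                        for i in range(size) for j in range(size))
            if best_sad is None or total < best_sad:
                best_sad, best_shift = total, (dy, dx)
    return best_shift
```

Each candidate shift's SAD is computed independently of the others, so on a GPU the candidates could be evaluated in parallel and reduced to a minimum afterward.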

In act 58, the ultrasound data output from the GPU 34 is processed with a different processor. This additional processing is performed prior to generating a display responsive to the data input to the GPU 34. For example, the processor 48 implements filtering, detection, scan conversion, beamformation, motion detection or another ultrasound process. The data output by the processor 48 is provided to the display 46 with or without passing through the GPU 34. For example, the GPU 34 processes data for a three-dimensional representation using an ultrasound process. The data is then provided to the processor 48 and/or additional devices for further ultrasound processing, such as filtering. The ultrasound data is then provided back to the GPU 34 for graphics processing and output to the display 46.

In act 60, the ultrasound data is output to the display. The display is responsive to the ultrasound data processed one or more times by the GPU 34 and by another, different processor. The GPU has multiple programmable processors while being relatively inexpensive. Its large parallel processing capability is less susceptible to interrupts than processors operating pursuant to an operating system. Using high-level languages, the GPU is programmed to perform an ultrasound process.

While the invention has been described above by reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the invention. For example, the GPU may implement only graphics processing, or may implement ultrasound processing as well as graphics processing. As another example, the GPU for implementing ultrasound processing is provided in a system different than described above for FIG. 1. Similarly, the system described for FIG. 1 may use the GPU for graphics processing or other volume rendering of ultrasound or non-ultrasound data without ultrasound processing by the GPU. The drivers or software may be adapted for use with reprogrammable processors or GPUs, such as by reprogramming an FPGA during use or by service personnel.

It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and the scope of this invention.

1. (canceled)
 2. (canceled)
 3. (canceled)
 4. (canceled)
 5. (canceled)
 6. (canceled)
 7. (canceled)
 8. (canceled)
 9. (canceled)
 10. (canceled)
 11. (canceled)
 12. (canceled)
 13. (canceled)
 14. (canceled)
 15. (canceled)
 16. In a method for loading a video memory of a graphics processing unit where a central processing unit interacts with a second memory and the graphics processing unit, an improvement comprising: (a) loading data pre-formatted into a format expected by the graphics processing unit into the video memory without storing the data in a cache of the central processing unit.
 17. The method of claim 16 wherein (a) comprises: (a1) transferring the data to a graphics aperture region of the second memory from a source of data; and (a2) transferring the data from the graphics aperture region to the video memory without processing of the data by the central processing unit.
 18. The method of claim 17 wherein (a1) comprises writing ultrasound data from a beamformer into the graphics aperture region.
 19. The method of claim 17 further comprising: (b) sliding the graphics aperture region of the second memory in an address loop.
 20. The method of claim 16 wherein (a) comprises transferring the data to the video memory without storing the data in a graphics aperture region.
 21. The method of claim 16 wherein the central processing unit is operable to run an application programming interface for the graphics processing unit and operable to operate free of copying data between different locations of the second memory for transfer to the video memory, the second memory being a random access memory accessible to the central processing unit through a hub, the video memory connectable to the second memory through an accelerated graphics port of the hub.
 22. The method of claim 16 wherein the data is formatted for the graphics processing unit without processing by the central processing unit.
 23. The method of claim 16 further comprising: (b) compressing the data prior to (a); and (c) decompressing the data after (a) with the graphics processing unit.
 24. A system for loading a video memory of a graphics processing unit, the system comprising: a central processing unit connected with the graphics processing unit, the central processing unit operable to run an application programming interface of the graphics processing unit; a source of data connected with the graphics processing unit; a first memory connected with the graphics processing unit and the central processing unit; and a memory control hub connected with the central processing unit, a video memory of the graphics processing unit, the source and the first memory; wherein data is transferable from the source to the video memory without copying of the data by the central processing unit.
 25. The system of claim 24 wherein the memory control hub is operable to route the data from the source to the video memory through the memory control hub without passing to the central processing unit.
 26. The system of claim 24 wherein the first memory has a graphics aperture region connected with the central processing unit, the data from the source operable to route to the video memory from the source through the graphics aperture region without passing to the central processing unit.
 27. The system of claim 26 wherein the graphics aperture region is operable to slide in an address loop.
 28. The system of claim 26 wherein the memory control hub connects with the graphics processing unit with an accelerated graphics bus, connects with the central processing unit with a host bus, and connects with the second memory with a memory bus.
 29. The system of claim 24 wherein the first memory connects with the central processing unit, the data from the source operable to route to the video memory from the source without passing to the central processing unit and without passing to the first memory, the first memory being a random access memory of the central processing unit.
 30. The system of claim 24 wherein the central processing unit includes a cache memory, the data transferring to the video memory without storing the data in the cache memory.
 31. The system of claim 24 wherein the source of data comprises a medical sensor, the data being medical diagnostic imaging data.
 32. The system of claim 24 wherein the source of data comprises an ultrasound beamformer, the data being ultrasound data.
 33. The system of claim 24 wherein the source of data is operable to format the data for the graphics processing unit without processing by the central processing unit.
 34. The system of claim 24 wherein the data comprises compressed data and the graphics processing unit is operable to decompress the data.